NSF PAR Search | NSF Public Access Repository

ReMoDeL-FPGA: Reconfigurable Memory-centric Array Processor Architecture for Deep-Learning Acceleration on FPGA

Kabir, MD Arafat (August 2024, ScholarWorks@UARK https://scholarworks.uark.edu/)

Deep learning has become a dominant computing paradigm across a broad range of application domains. Deep-network architectures such as CNNs, MLPs, and RNNs have emerged as the prominent machine-learning approaches for today’s applications. These architectures are heavily data-dependent and require frequent access to memory. As a result, they suffer the most from the memory bottleneck of von Neumann architectures. There is a pressing need for memory-centric architectures for memory-intensive deep-learning and big-data analytics applications. Modern Field Programmable Gate Arrays (FPGAs) are ideal programmable substrates for creating customized Processor in/near Memory (PIM) accelerators. They contain hundreds of Mbits of dual-ported SRAM in the form of disaggregated, configurable Block RAMs (BRAMs), which together offer TB/s of internal bandwidth. Unfortunately, developing FPGA-based accelerators for deep learning is not a simple task: it demands specialized tools provided by the FPGA vendors and expertise in low-level hardware microarchitecture design, both of which are often unavailable to researchers in the field of deep learning. Even with the ongoing improvements in High-Level Synthesis (HLS) tools, the requirement for hardware-specific design knowledge cannot be completely eliminated. This research developed a new reconfigurable memory-centric architecture and design approach that opens the advantages of FPGAs and Processor-in-Memory architecture to memory-intensive applications. Due to its high-performance and scalable memory-centric design, this architecture can deliver the highest speed and the lowest latency achievable from an FPGA, overcoming the memory bottleneck.
Full Text Available
IMAGine: An In-Memory Accelerated GEMV Engine Overlay

https://doi.org/10.1109/FPL64840.2024.00038

Kabir, Md Arafat; Kamucheka, Tendayi; Fredricks, Nathaniel; Mandebi, Joel; Bakos, Jason; Huang, Miaoqing; Andrews, David (September 2024, IEEE)

Full Text Available
The BRAM is the Limit: Shattering Myths, Shaping Standards, and Building Scalable PIM Accelerators

https://doi.org/10.1109/FCCM60383.2024.00045

Kabir, MD Arafat; Kamucheka, Tendayi; Fredricks, Nathaniel; Mandebi, Joel; Bakos, Jason; Huang, Miaoqing; Andrews, David (May 2024, IEEE)

Full Text Available
A Scalable In-Context Design and Extraction Flow for Heterogeneous 2.5D Chiplet-Package Co-Optimization

https://doi.org/10.1109/EPEPS51341.2021.9609155

Kabir, MD Arafat; Petranovic, Dusan; Peng, Yarui (October 2021, IEEE Conference on Electrical Performance of Electronic Packaging and Systems)

Full Text Available
Holistic Chiplet–Package Co-Optimization for Agile Custom 2.5-D Design

https://doi.org/10.1109/TCPMT.2021.3069724

Kabir, MD Arafat; Peng, Yarui (May 2021, IEEE Transactions on Components, Packaging and Manufacturing Technology)
Full Text Available
Cross-Boundary Inductive Timing Optimization for 2.5D Chiplet-Package Co-Design

https://doi.org/10.1145/3453688.3461505

Kabir, MD Arafat; Petranovic, Dusan; Peng, Yarui (June 2021, Great Lakes Symposium on VLSI)
Full Text Available
Holistic and In-Context Design Flow for 2.5D Chiplet-Package Interaction Co-Optimization

https://doi.org/10.1109/VLSI-DAT52063.2021.9427353

Kabir, MD Arafat; Hung, Weishiun; Ho, Tsung-Yi; Peng, Yarui (April 2021, International Symposium on VLSI Design, Automation and Test (VLSI-DAT))
Full Text Available
Holistic 2.5D Chiplet Design Flow: A 65nm Shared-Block Microcontroller Case Study

https://doi.org/10.1109/SOCC49529.2020.9524798

Kabir, MD Arafat; Peng, Yarui (September 2020, IEEE International System-on-Chip Conference)

Full Text Available
Coupling extraction and optimization for heterogeneous 2.5D chiplet-package co-design

https://doi.org/10.1145/3400302.3415718

Kabir, MD Arafat; Petranovic, Dusan; Peng, Yarui (November 2020, International Conference On Computer Aided Design (ICCAD))

Full Text Available
Chiplet-Package Co-Design For 2.5D Systems Using Standard ASIC CAD Tools

https://doi.org/10.1109/ASP-DAC47756.2020.9045734

Kabir, MD Arafat; Peng, Yarui (January 2020, 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC))

Chiplet integration using 2.5D packaging is gaining popularity because it enables features such as heterogeneous integration and the drop-in design method. In the traditional die-by-die approach to designing a 2.5D system, each chiplet is designed independently, without any knowledge of the package redistribution layers (RDLs). In this paper, we propose a Chip-Package Co-Design flow for implementing 2.5D systems using existing commercial chip design tools. Our flow encompasses 2.5D-aware partitioning suitable for SoC design, Chip-Package Floorplanning, and post-design analysis and verification of the entire 2.5D system. We also designed our own package planners to route RDLs on top of chiplet layers. We use an ARM Cortex-M0 SoC system to illustrate our flow and compare analysis results with a monolithic 2D implementation of the same system. We also compare two different 2.5D implementations of the same SoC system following the drop-in approach. Alongside the traditional die-by-die approach, our holistic flow enables design efficiency and flexibility with accurate cross-boundary parasitic extraction and design verification.
Full Text Available
